Synthesis of visible speech
Abstract
Given the importance of visible information in face-to-face communication, visible speech synthesis is being developed to control and manipulate visible speech. Experiments have shown that this visible speech is particularly important when the auditory speech is degraded because of noise, bandwidth filtering, or hearing impairment (Massaro, 1987). The strong influence of visible speech is not limited to situations with degraded auditory input, however; it occurs even when visible speech is paired with perfectly intelligible speech sounds. The influence of visible speech is easily experienced in a demonstration of the McGurk effect (McGurk & MacDonald, 1976). A videotape of a person making a visible labial articulation /pa-pa/ is dubbed with the alveolar nasal speech sounds /na-na/. This dubbed speech event gives a situation in which intelligible auditory speech is paired with a contradictory visual articulation. A strong effect of the visual source of information is observed, with the viewer often reporting hearing the labial nasal /ma-ma/. It can be shown that the viewer's experience is influenced by both the audible and the visible speech. The value of visible speech in speech perception warrants that visible speech should be studied, just as auditory speech has been studied. Although some progress has been made using natural speech as stimuli, the synthesis of a speaker's face permits a better controlled and more systematic analysis of the perceptual process. Given that synthetic auditory speech has proven valuable for the study of auditory speech perception, visible speech synthesis should be a valuable tool for the study of visual and bimodal (auditory-visual) speech perception. In addition to enabling exact control over the speech stimulus, synthetic speech allows the creation of novel speech segments that are not easy to produce naturally.
Presenting novel stimuli to subjects in psychophysical tasks provides important information about the processes involved in perception.